Flexible Environment

1 Overview

In cpprb version 8 and newer, you can store any number of environments (e.g. observation, action, and so on).

For example, you can add custom environments such as next_next_obs or second_reward.

Each environment can have a multi-dimensional shape (e.g. 3, (4,4), or (84,84,4)) and any NumPy data type.

1.1 __init__

To construct a replay buffer, you need to specify env_dict, the second parameter of its constructor.

env_dict is a dict whose keys are environment names and whose values are dicts describing each environment's properties.

The following table lists the supported properties and their default values.

key    description                      type                       default value
shape  shape (size of each dimension)   int or array-like of int   1
dtype  data type                        numpy.dtype                default_dtype passed to the constructor, or numpy.single if not given

1.2 add

When adding environments to the replay buffer with add, you have to pass them as keyword arguments (key=value style). If an environment name is not a syntactically valid Python identifier, you can still build a dict first and then unpack it with the ** operator (e.g. rb.add(**kwargs)).

1.3 sample

sample returns a dict whose keys are the environment names and whose values are the sampled values.

2 Example Usage

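from cpprb import ReplayBuffer
import numpy as np

buffer_size = 32

rb = ReplayBuffer(buffer_size, {"obs": {"shape": (4,4)},
                                "act": {"shape": 1},
                                "rew": {},
                                "next_obs": {"shape": (4,4)},
                                "next_next_obs": {"shape": (4,4)},
                                "done": {},
                                "my_important_info": {"dtype": np.short}})

# Store 100 dummy transitions; only the latest 32 are kept.
for i in range(100):
    rb.add(obs=np.ones((4,4)),
           act=1,
           rew=1.0,
           next_obs=np.ones((4,4)),
           next_next_obs=np.ones((4,4)),
           done=0,
           my_important_info=i)

# Sample a mini-batch: a dict keyed by environment name.
sample = rb.sample(8)
print(sample["my_important_info"].dtype)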
3 Notes

For PrioritizedReplayBuffer, priorities, weights, and indexes are special environments that the buffer sets automatically.

4 Technical Detail

Internally, these flexible environments are implemented with numpy.ndarray (via Cython). In versions older than 8, they were implemented in C++, which limited flexibility in both data types and the number of environments. (As a workaround, all extra environments had to be packed into act, which was not treated specially.)
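To illustrate the general idea of keeping one preallocated ndarray per environment, here is a simplified, hypothetical ring buffer. The class name DictRingBuffer and all of its details are assumptions for illustration only; it is not cpprb's actual Cython implementation and omits priorities, n-step handling, and other features.

```python
import numpy as np


class DictRingBuffer:
    """Hypothetical sketch: one preallocated numpy.ndarray per environment."""

    def __init__(self, size, env_dict, default_dtype=np.single, seed=None):
        self.size = size
        self.index = 0   # next slot to overwrite
        self.stored = 0  # number of valid slots
        self.rng = np.random.default_rng(seed)
        self.buffer = {}
        for name, props in env_dict.items():
            # Missing properties fall back to shape 1 and default_dtype.
            shape = tuple(np.atleast_1d(props.get("shape", 1)))
            dtype = props.get("dtype", default_dtype)
            self.buffer[name] = np.zeros((size, *shape), dtype=dtype)

    def add(self, **kwargs):
        # Write one transition, reshaping each value to its environment's shape.
        for name, array in self.buffer.items():
            array[self.index] = np.reshape(kwargs[name], array.shape[1:])
        self.index = (self.index + 1) % self.size
        self.stored = min(self.stored + 1, self.size)

    def sample(self, batch_size):
        # Uniform sampling with replacement over the stored slots.
        idx = self.rng.integers(0, self.stored, batch_size)
        return {name: array[idx] for name, array in self.buffer.items()}


rb = DictRingBuffer(4, {"obs": {"shape": (2, 2)},
                        "rew": {},
                        "info": {"dtype": np.short}}, seed=0)
for i in range(6):
    rb.add(obs=np.full((2, 2), i), rew=i, info=i)
batch = rb.sample(3)
```

Because every environment is just another dict entry holding an ndarray, adding a new name, shape, or dtype requires no special-case code, which is the flexibility the older fixed C++ layout lacked.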